Overview
The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.
Data
This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under $20), coffee houses, carry out & take away, bar, and more expensive restaurants ($20 - $50).
Deliverables
Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.
The attributes of this data set include:
User attributes
Contextual attributes
Coupon attributes
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
Use the prompts below to get started with your data analysis.
Read in the coupons.csv file.
data = pd.read_csv('data/coupons.csv')
data.head()
| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalStatus | ... | CoffeeHouse | CarryAway | RestaurantLessThan20 | Restaurant20To50 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
5 rows × 26 columns
data.info() #Checking the Dtype of columns in the DataFrame
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | destination | 12684 non-null | object |
| 1 | passanger | 12684 non-null | object |
| 2 | weather | 12684 non-null | object |
| 3 | temperature | 12684 non-null | int64 |
| 4 | time | 12684 non-null | object |
| 5 | coupon | 12684 non-null | object |
| 6 | expiration | 12684 non-null | object |
| 7 | gender | 12684 non-null | object |
| 8 | age | 12684 non-null | object |
| 9 | maritalStatus | 12684 non-null | object |
| 10 | has_children | 12684 non-null | int64 |
| 11 | education | 12684 non-null | object |
| 12 | occupation | 12684 non-null | object |
| 13 | income | 12684 non-null | object |
| 14 | car | 108 non-null | object |
| 15 | Bar | 12577 non-null | object |
| 16 | CoffeeHouse | 12467 non-null | object |
| 17 | CarryAway | 12533 non-null | object |
| 18 | RestaurantLessThan20 | 12554 non-null | object |
| 19 | Restaurant20To50 | 12495 non-null | object |
| 20 | toCoupon_GEQ5min | 12684 non-null | int64 |
| 21 | toCoupon_GEQ15min | 12684 non-null | int64 |
| 22 | toCoupon_GEQ25min | 12684 non-null | int64 |
| 23 | direction_same | 12684 non-null | int64 |
| 24 | direction_opp | 12684 non-null | int64 |
| 25 | Y | 12684 non-null | int64 |
dtypes: int64(8), object(18)
memory usage: 2.5+ MB
data.describe()
| | temperature | has_children | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|
| count | 12684.000000 | 12684.000000 | 12684.0 | 12684.000000 | 12684.000000 | 12684.000000 | 12684.000000 | 12684.000000 |
| mean | 63.301798 | 0.414144 | 1.0 | 0.561495 | 0.119126 | 0.214759 | 0.785241 | 0.568433 |
| std | 19.154486 | 0.492593 | 0.0 | 0.496224 | 0.323950 | 0.410671 | 0.410671 | 0.495314 |
| min | 30.000000 | 0.000000 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 55.000000 | 0.000000 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 80.000000 | 0.000000 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
| 75% | 80.000000 | 1.000000 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 |
| max | 80.000000 | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
data.convert_dtypes().dtypes #Converting DF columns to standard Dtypes
| Column | Dtype |
|---|---|
| destination | string |
| passanger | string |
| weather | string |
| temperature | Int64 |
| time | string |
| coupon | string |
| expiration | string |
| gender | string |
| age | string |
| maritalStatus | string |
| has_children | Int64 |
| education | string |
| occupation | string |
| income | string |
| car | string |
| Bar | string |
| CoffeeHouse | string |
| CarryAway | string |
| RestaurantLessThan20 | string |
| Restaurant20To50 | string |
| toCoupon_GEQ5min | Int64 |
| toCoupon_GEQ15min | Int64 |
| toCoupon_GEQ25min | Int64 |
| direction_same | Int64 |
| direction_opp | Int64 |
| Y | Int64 |
dtype: object
#Convert age from string to int64
data['age'] = data['age'].replace({'50plus':'50', 'below21':'20'}) #Map '50plus' to 50 and 'below21' to 20; assigning back avoids the chained inplace replace
data['age'] = data['age'].astype(np.int64)
data.isnull().sum() #Checking null values per column in the DF
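| Column | Null count |
|---|---|
| destination | 0 |
| passanger | 0 |
| weather | 0 |
| temperature | 0 |
| time | 0 |
| coupon | 0 |
| expiration | 0 |
| gender | 0 |
| age | 0 |
| maritalStatus | 0 |
| has_children | 0 |
| education | 0 |
| occupation | 0 |
| income | 0 |
| car | 12576 |
| Bar | 107 |
| CoffeeHouse | 217 |
| CarryAway | 151 |
| RestaurantLessThan20 | 130 |
| Restaurant20To50 | 189 |
| toCoupon_GEQ5min | 0 |
| toCoupon_GEQ15min | 0 |
| toCoupon_GEQ25min | 0 |
| direction_same | 0 |
| direction_opp | 0 |
| Y | 0 |
dtype: int64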
#Checked ALL columns in the DF for special characters etc. For example,
data['destination'].sort_values().unique()
array(['Home', 'No Urgent Place', 'Work'], dtype=object)
#Standardized all column names to lower case and reassigned to a new DF
data1 = data.rename(columns = str.lower)
#Since more than 99% of the car values are missing (12576 of 12684), we drop this column and assign the result to a new DF
data2 = data1.drop(columns = ['car'])
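As a quick check on how sparse the car column is, the share of missing values can be computed with isna().mean(). The sketch below is only an illustration: it rebuilds a stand-in column from the counts reported by data.info() (108 non-null values out of 12684 rows), and the placeholder value 'some car' is made up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the car column: 108 filled entries and 12576 missing,
# matching the counts reported by data.info() and data.isnull().sum().
car = pd.Series(["some car"] * 108 + [np.nan] * 12576, name="car")

# isna() gives a boolean mask; its mean is the fraction of missing rows.
missing_share = car.isna().mean()
print(f"{missing_share:.1%}")  # 99.1%
```

On the real DataFrame, data.isna().mean() would give the same kind of fraction for every column at once.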
#Dropped the rows where bar, coffeehouse, carryaway, restaurantlessthan20 and restaurant20to50 are all NaN at the same time
data3 = data2.dropna(subset=['bar', 'coffeehouse', 'carryaway', 'restaurantlessthan20', 'restaurant20to50'], how='all')
#Reduced DF from 12684 to 12642 rows
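The how='all' argument matters here: it removes only the rows in which every listed column is missing, whereas how='any' would remove a row as soon as one of them is missing. A minimal sketch on a made-up two-column frame (the values are illustrative, not taken from the survey):

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and the same kind of gaps as the survey columns.
toy = pd.DataFrame({
    "bar":         ["never", np.nan, np.nan, "1~3"],
    "coffeehouse": ["less1", "4~8",  np.nan, np.nan],
})

# how="all": drop a row only when every column in subset is NaN (row 2 here).
dropped_all = toy.dropna(subset=["bar", "coffeehouse"], how="all")

# how="any": drop a row when at least one column in subset is NaN (rows 1, 2, 3).
dropped_any = toy.dropna(subset=["bar", "coffeehouse"], how="any")

print(len(dropped_all))  # 3
print(len(dropped_any))  # 1
```

Keeping the partially filled rows is what allows the remaining gaps to be labeled 'No Data' in the next step instead of being discarded.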
#Filling NaN values with "No Data"
data4 = data3.replace(to_replace = np.nan, value='No Data')
data4.isnull().sum() #Final check for any NaN values
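| Column | Null count |
|---|---|
| destination | 0 |
| passanger | 0 |
| weather | 0 |
| temperature | 0 |
| time | 0 |
| coupon | 0 |
| expiration | 0 |
| gender | 0 |
| age | 0 |
| maritalstatus | 0 |
| has_children | 0 |
| education | 0 |
| occupation | 0 |
| income | 0 |
| bar | 0 |
| coffeehouse | 0 |
| carryaway | 0 |
| restaurantlessthan20 | 0 |
| restaurant20to50 | 0 |
| tocoupon_geq5min | 0 |
| tocoupon_geq15min | 0 |
| tocoupon_geq25min | 0 |
| direction_same | 0 |
| direction_opp | 0 |
| y | 0 |
dtype: int64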
coupon_accepted = data4.query("y==1").shape[0] #7181
coupon_all = data4['y'].shape[0] #12642
coupon_accepted_rate = coupon_accepted/coupon_all
coupon_accepted_rate
#56.8% of drivers accept the coupon
0.5680272108843537
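Because y is coded as 0 or 1, the acceptance rate is also simply the mean of the column. The sketch below, on five made-up responses, shows that the count-based calculation and the mean agree:

```python
import pandas as pd

# Five made-up responses: 1 = coupon accepted, 0 = coupon declined.
toy = pd.DataFrame({"y": [1, 0, 1, 1, 0]})

# Count-based rate, in the same style as the cells above.
rate_from_counts = toy.query("y == 1").shape[0] / toy.shape[0]

# Mean of a 0/1 column: the sum counts the ones, the length counts all rows.
rate_from_mean = toy["y"].mean()

print(rate_from_counts, rate_from_mean)  # 0.6 0.6
```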
sns.countplot(data=data4, x='y')
plt.title('Coupons Not Accepted vs Accepted')
plt.xlabel('Coupon Acceptance: 0-No, 1-Yes')
#The no. of coupons accepted is higher than not accepted
Use a bar plot to visualize the coupon column.
px.bar(data4, x='coupon', title='Coupon Type and Count', color='coupon')
#Coffee House coupons are the most frequently offered coupon type
sns.displot(data=data4, x='temperature', kind='hist')
plt.title('Simple histogram showing Temperature')
#Most responses were recorded at warmer temperatures (the median temperature is 80)
Investigating the Bar Coupons
Now, we will lead you through an exploration of just the bar related coupons.
Create a new DataFrame that contains just the bar coupons.
data4_bar = data4.query("coupon == 'Bar'")
data4_bar
| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalstatus | ... | coffeehouse | carryaway | restaurantlessthan20 | restaurant20to50 | tocoupon_geq5min | tocoupon_geq15min | tocoupon_geq25min | direction_same | direction_opp | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | No Urgent Place | Kid(s) | Sunny | 80 | 10AM | Bar | 1d | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 13 | Home | Alone | Sunny | 55 | 6PM | Bar | 1d | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 0 | 0 | 1 | 0 | 1 |
| 17 | Work | Alone | Sunny | 55 | 7AM | Bar | 1d | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 1 | 1 | 0 | 1 | 0 |
| 24 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 21 | Single | ... | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 35 | Home | Alone | Sunny | 55 | 6PM | Bar | 1d | Male | 21 | Single | ... | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12663 | No Urgent Place | Friend(s) | Sunny | 80 | 10PM | Bar | 1d | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12664 | No Urgent Place | Friend(s) | Sunny | 55 | 10PM | Bar | 2h | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12667 | No Urgent Place | Alone | Rainy | 55 | 10AM | Bar | 1d | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12670 | No Urgent Place | Partner | Rainy | 55 | 6PM | Bar | 2h | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12682 | Work | Alone | Snowy | 30 | 7AM | Bar | 1d | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 1 | 0 | 1 | 0 |
2008 rows × 25 columns
bar_coupon_acceptance = data4_bar.query("y == 1").shape[0] #822
bar_coupon_all = data4_bar.shape[0] #2008
bar_coupon_acceptance_rate = bar_coupon_acceptance/bar_coupon_all
bar_coupon_acceptance_rate
# 40.9% of drivers accepted a bar coupon
0.40936254980079684
bar_coupon_acceptance_1to3 = data4_bar.query("y == 1 & bar == '1~3'").shape[0] #257
bar_coupon_acceptance_gt3 = data4_bar.query("y == 1 & bar == ['4~8','gt8']").shape[0] #153
bar_coupon_acceptance_1to3_rate = bar_coupon_acceptance_1to3/bar_coupon_acceptance
bar_coupon_acceptance_gt3_rate = bar_coupon_acceptance_gt3/bar_coupon_acceptance
bar_coupon_acceptance_1to3_rate #31.3%
bar_coupon_acceptance_gt3_rate #18.6%
#Drivers who go to a bar 1-3 times account for a larger share of accepted bar coupons than drivers who go more frequently
0.18613138686131386
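The two figures above are shares of all accepted bar coupons, so they also reflect how many drivers fall in each frequency group. A within-group acceptance rate (accepted divided by offered, per group) answers a different question. The sketch below contrasts the two on made-up rows; the numbers are illustrative only and are not results from the survey.

```python
import pandas as pd

# Made-up rows: bar-visit frequency and whether the coupon was accepted.
toy = pd.DataFrame({
    "bar": ["1~3", "1~3", "1~3", "1~3", "4~8", "gt8", "never", "never"],
    "y":   [1,     1,     0,     0,     1,     1,     0,     1],
})

# Share of acceptances: among accepted coupons, which group did they come from?
accepted = toy[toy["y"] == 1]
share_of_acceptances = accepted["bar"].value_counts(normalize=True)

# Within-group rate: among coupons offered to a group, what fraction was accepted?
rate_within_group = toy.groupby("bar")["y"].mean()

print(share_of_acceptances["1~3"])  # 0.4
print(rate_within_group["1~3"])     # 0.5
print(rate_within_group["4~8"])     # 1.0
```

In this toy data the 1~3 group supplies the largest share of acceptances only because it is the largest group, while its within-group rate is lower than that of the 4~8 group.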
bar_goers_over25_coupon_acceptance = data4_bar.query("y==1 & bar != ['less1','never','No Data'] & age>25").shape[0] #292
bar_goers_under25_coupon_acceptance = data4_bar.query("y==1 & bar != ['less1','never','No Data'] & age<25").shape[0] #118
bar_goers_over25_coupon_acceptance_rate = bar_goers_over25_coupon_acceptance/bar_coupon_acceptance
bar_goers_under25_coupon_acceptance_rate = bar_goers_under25_coupon_acceptance/bar_coupon_acceptance
bar_goers_over25_coupon_acceptance_rate #35.5%
bar_goers_under25_coupon_acceptance_rate #14.4%
# 35.5% of accepted bar coupons came from drivers over age 25 who go to a bar at least once a month
# This group's share of accepted bar coupons is higher than that of its younger counterparts (age less than 25)
0.1435523114355231
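One way to compare a target group with everyone else is to build a boolean mask for the group and use its negation for the rest. This is a sketch under assumptions: the rows below are made up, and the names frequent, target, rate_target and rate_others are hypothetical helpers that do not appear in the notebook.

```python
import pandas as pd

# Made-up rows with the same column names and category labels as data4_bar.
toy = pd.DataFrame({
    "bar": ["1~3", "4~8", "never", "less1", "1~3", "never"],
    "age": [26,    31,    21,      46,      21,    36],
    "y":   [1,     1,     0,       0,       1,     0],
})

# Target group: goes to a bar at least once a month AND is over 25.
frequent = toy["bar"].isin(["1~3", "4~8", "gt8"])
target = frequent & (toy["age"] > 25)

# Acceptance rate inside the group versus all remaining rows.
rate_target = toy.loc[target, "y"].mean()
rate_others = toy.loc[~target, "y"].mean()

print(rate_target, rate_others)  # 1.0 0.25
```

Both rates use the group's own row count as the denominator, so they can be compared directly even when the groups differ in size.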
px.histogram(data4_bar.query("bar == ['1~3','4~8','gt8']"), x='age', color='y', color_discrete_sequence=px.colors.qualitative.Dark24)
The above chart confirms that younger drivers (under 25) go to bars more often than the other groups. However, drivers over 25 tend to accept more bar coupons than their younger counterparts. Although the counts fall between ages 30 and 50, they pick back up for older drivers (age 50 and over), who frequent bars and accept more bar coupons.
plt.hist(data4_bar['age'], edgecolor='black', density=True, alpha = 0.05, color='red')
(array([0.08349934, 0. , 0.06507304, 0.0561089 , 0. ,
0.03469456, 0. , 0.02921647, 0.01809429, 0.04664675]),
array([20., 23., 26., 29., 32., 35., 38., 41., 44., 47., 50.]),
<BarContainer object of 10 artists>)
The above chart shows that younger drivers (under 25) form the largest single bin among drivers who received bar coupons, although the 26 and 31 age groups combined outnumber them. The counts dwindle as drivers get older and pick back up for drivers aged 50 and above, a bucket that pools everyone recorded as '50plus'.
bar_goers_nokid_occupation_coupon_acceptance = data4_bar.query("y==1 & bar != ['less1','never','No Data'] & passanger!='Kid(s)' & occupation!='Farming Fishing & Forestry'").shape[0] #393
bar_goers_nokid_occupation_coupon_acceptance_rate = bar_goers_nokid_occupation_coupon_acceptance/bar_coupon_acceptance
bar_goers_nokid_occupation_coupon_acceptance_rate #47.8%
# 47.8% of accepted bar coupons came from drivers who go to bars more than once a month, had no kid passengers, and do not work in farming, fishing, or forestry.
0.4781021897810219
#go to bars more than once a month, had passengers that were not a kid, and were not widowed
bar_goers_nokid_notwidowed = data4_bar.query("y==1 & bar != ['less1','never','No Data'] & passanger!='Kid(s)' & maritalstatus!='Widowed'").shape[0] #393
bar_goers_nokid_notwidowed_rate = bar_goers_nokid_notwidowed/bar_coupon_acceptance
bar_goers_nokid_notwidowed_rate #47.8%
0.4781021897810219
#go to bars more than once a month and are under the age of 30
bar_goers_under30 = data4_bar.query("y==1 & bar!= ['less1','never','No Data'] & age < 30").shape[0] #249
bar_goers_under30_rate = bar_goers_under30/bar_coupon_acceptance
bar_goers_under30_rate #30.3%
0.3029197080291971
#go to cheap restaurants more than 4 times a month and income is less than 50K
data4_cheap_rest = data4.query("coupon == 'Restaurant(<20)'")
cheap_rest_income_less50k = data4_cheap_rest.query("y==1 & restaurantlessthan20!=['less1','1~3','never','No Data'] & income==['$12500 - $24999','$25000 - $37499','$37500 - $49999','Less than $12500']").shape[0] #359
cheap_rest_income_less50k_rate = cheap_rest_income_less50k/data4_cheap_rest.shape[0] #2777
cheap_rest_income_less50k_rate #12.9%
0.12927619733525386
#Same restaurant-frequency and income filter, now applied to the bar coupons; the result is a share of accepted bar coupons
cheap_rest_income_less50k = data4_bar.query("y==1 & restaurantlessthan20!=['less1','1~3','never','No Data'] & income==['$12500 - $24999','$25000 - $37499','$37500 - $49999','Less than $12500']").shape[0] #156
cheap_rest_income_less50k_rate = cheap_rest_income_less50k/bar_coupon_acceptance
cheap_rest_income_less50k_rate #19.0%
0.1897810218978102
Though younger drivers tend to go to bars more often, drivers in the 26 age bracket tend to accept more bar coupons than any other age group. Drivers who visit cheap restaurants 4 or more times a month and earn less than $50K account for only about 19% of accepted bar coupons, so this combination of restaurant habits and income does not appear to drive coupon acceptance.
Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.
#Since Coffeehouse has the highest coupon count, I am choosing this group to understand the coupon acceptance rate.
data4_coffee = data4.query("coupon == 'Coffee House'")
data4_coffee
| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalstatus | ... | coffeehouse | carryaway | restaurantlessthan20 | restaurant20to50 | tocoupon_geq5min | tocoupon_geq15min | tocoupon_geq25min | direction_same | direction_opp | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12 | No Urgent Place | Kid(s) | Sunny | 55 | 6PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 |
| 15 | Home | Alone | Sunny | 80 | 6PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | No Data | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12656 | Home | Alone | Snowy | 30 | 10PM | Coffee House | 2h | Male | 31 | Married partner | ... | never | 4~8 | gt8 | less1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12659 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 31 | Married partner | ... | never | 4~8 | gt8 | less1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 12674 | Home | Alone | Rainy | 55 | 10PM | Coffee House | 2h | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 1 | 0 | 0 |
| 12675 | Home | Alone | Snowy | 30 | 10PM | Coffee House | 2h | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 12681 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 26 | Single | ... | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 1 | 0 | 0 |
3975 rows × 25 columns
#What proportion of coffee house coupons was accepted?
coffee_coupon_acceptance = data4_coffee.query("y == 1").shape[0] #1983
coffee_coupon_all = data4_coffee.shape[0] #3975
coffee_coupon_acceptance_rate = coffee_coupon_acceptance/coffee_coupon_all
coffee_coupon_acceptance_rate #49.9%
#There is a 49.9% likelihood that coffee house coupons will be accepted by drivers
0.49886792452830186
#What share of accepted coffee house coupons came from drivers who visit coffee houses 1-3 times vs those who visit more frequently (4 or more times)?
coffee_coupon_acceptance_1to3 = data4_coffee.query("y == 1 & coffeehouse == '1~3'").shape[0] #675
coffee_coupon_acceptance_gt3 = data4_coffee.query("y == 1 & coffeehouse == ['4~8','gt8']").shape[0] #594
coffee_coupon_acceptance_1to3_rate = coffee_coupon_acceptance_1to3/coffee_coupon_acceptance
coffee_coupon_acceptance_gt3_rate = coffee_coupon_acceptance_gt3/coffee_coupon_acceptance
coffee_coupon_acceptance_1to3_rate #34%
coffee_coupon_acceptance_gt3_rate #30.0%
#Drivers who go to coffee houses 1-3 times account for a larger share of accepted coffee house coupons than those who go more frequently
0.29954614220877457
#What share of accepted coffee house coupons came from drivers who visit at least once a month and are over age 25, compared with those under 25?
coffee_goers_over25_coupon_acceptance = data4_coffee.query("y==1 & coffeehouse != ['less1','never','No Data'] & age>25").shape[0] #867
coffee_goers_under25_coupon_acceptance = data4_coffee.query("y==1 & coffeehouse != ['less1','never','No Data'] & age<25").shape[0] #402
coffee_goers_over25_coupon_acceptance_rate = coffee_goers_over25_coupon_acceptance/coffee_coupon_acceptance
coffee_goers_under25_coupon_acceptance_rate = coffee_goers_under25_coupon_acceptance/coffee_coupon_acceptance
coffee_goers_over25_coupon_acceptance_rate #43.7%
coffee_goers_under25_coupon_acceptance_rate #20.3%
# Drivers over age 25 account for more accepted coffee house coupons than younger drivers
0.2027231467473525
px.histogram(data4_coffee.query("coffeehouse == ['1~3','4~8','gt8']"), x='age', color='y', color_discrete_sequence=px.colors.qualitative.D3, title='Coffee House coupon acceptance for frequent visitors')
The above chart confirms that younger people (under 25) go to coffee houses more often than people over 25. The chart also shows that younger drivers accept more coffee house coupons than the other age groups. Though coupon acceptance trends down as drivers get older, it picks back up for older drivers (age 50 and over), who frequent coffee houses and accept more coffee house coupons.
#Coffee house coupon acceptance by Marital status
px.histogram(data4_coffee, x='maritalstatus', color='y', color_discrete_sequence=px.colors.qualitative.Dark2, title='Coffee House coupon acceptance by Marital Status')
Single drivers accept the most coffee house coupons, followed by drivers with a married partner. Widowed drivers visit the fewest coffee houses and accept the fewest coupons.
#Coffee house coupon acceptance by Income
px.histogram(data4_coffee.sort_values(by='income'), x='income', color='y', color_discrete_sequence=px.colors.qualitative.Alphabet, title='Coffee house coupon acceptance by Income')
Drivers with incomes below $50K tend to accept more coffee house coupons. Those with incomes between $50K and $99K tend to accept fewer. The trend reverses for drivers with incomes of $100K or more, who again accept more coffee house coupons.
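One caveat when reading the income chart: the income labels are strings, so sort_values(by='income') orders them alphabetically rather than by amount. A possible remedy, sketched below, is an ordered categorical. The list income_order is an assumption that contains only the four brackets quoted in the queries above; the higher brackets would need to be appended using their exact labels from the data.

```python
import pandas as pd

# Assumed bracket order, limited to the four labels that appear in this notebook.
income_order = [
    "Less than $12500",
    "$12500 - $24999",
    "$25000 - $37499",
    "$37500 - $49999",
]

# Made-up rows in arbitrary order.
toy = pd.DataFrame({"income": [
    "$25000 - $37499", "Less than $12500", "$37500 - $49999", "$12500 - $24999",
]})

# Plain string sort: "Less than $12500" lands last because "$" sorts before "L".
alphabetical = toy.sort_values(by="income")["income"].tolist()

# Ordered categorical: sorting now follows income_order instead of the alphabet.
toy["income"] = pd.Categorical(toy["income"], categories=income_order, ordered=True)
by_bracket = toy.sort_values(by="income")["income"].tolist()

print(alphabetical[-1])  # Less than $12500
print(by_bracket[0])     # Less than $12500
```

For Plotly Express charts, passing category_orders={'income': income_order} to px.histogram is another way to fix the axis order without changing the column.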
#Coffee house coupon acceptance by Education
px.histogram(data4_coffee, x='education', color='y', color_discrete_sequence=px.colors.qualitative.Plotly, title='Coffee house coupon acceptance by Education')
Drivers with no college degree accepted more coffee house coupons than the other education groups. Drivers in the high school group tend to visit fewer coffee houses and accept fewer coupons.
#Coffee house coupon acceptance by Gender
px.histogram(data4_coffee, x='gender', color='y', color_discrete_sequence=px.colors.qualitative.Vivid, title='Coffee house coupon acceptance by Gender')
Female drivers tend to go to coffee houses more than male drivers. Although female drivers reject more coffee house coupons in absolute terms, their acceptance rate is still high compared with that of their male counterparts.
sns.pairplot(data=data4_coffee, hue='y')
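data4_coffee_corr = data4_coffee.corr(numeric_only=True) #Restrict to numeric columns; recent pandas versions raise an error on the string columns otherwise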
data4_coffee_corr
| | temperature | age | has_children | tocoupon_geq5min | tocoupon_geq15min | tocoupon_geq25min | direction_same | direction_opp | y |
|---|---|---|---|---|---|---|---|---|---|
| temperature | 1.000000 | -0.020453 | -0.039430 | NaN | -0.160117 | -0.065253 | 0.013308 | -0.013308 | 0.072209 |
| age | -0.020453 | 1.000000 | 0.431939 | NaN | 0.046667 | 0.002879 | -0.046051 | 0.046051 | -0.073097 |
| has_children | -0.039430 | 0.431939 | 1.000000 | NaN | 0.100791 | 0.011209 | -0.039268 | 0.039268 | -0.017303 |
| tocoupon_geq5min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| tocoupon_geq15min | -0.160117 | 0.046667 | 0.100791 | NaN | 1.000000 | 0.279001 | -0.295472 | 0.295472 | -0.093320 |
| tocoupon_geq25min | -0.065253 | 0.002879 | 0.011209 | NaN | 0.279001 | 1.000000 | -0.140252 | 0.140252 | -0.089406 |
| direction_same | 0.013308 | -0.046051 | -0.039268 | NaN | -0.295472 | -0.140252 | 1.000000 | -1.000000 | 0.030670 |
| direction_opp | -0.013308 | 0.046051 | 0.039268 | NaN | 0.295472 | 0.140252 | -1.000000 | 1.000000 | -0.030670 |
| y | 0.072209 | -0.073097 | -0.017303 | NaN | -0.093320 | -0.089406 | 0.030670 | -0.030670 | 1.000000 |
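Two features of the table above follow from the data rather than from driver behavior: tocoupon_geq5min is always 1, so its correlations are undefined (NaN), and direction_opp is the complement of direction_same, so the two correlate at exactly -1. The sketch below reproduces both effects on made-up rows.

```python
import pandas as pd

# Made-up rows that mimic the structure of the three columns involved.
toy = pd.DataFrame({
    "tocoupon_geq5min": [1, 1, 1, 1],  # constant column, zero variance
    "direction_same":   [1, 0, 0, 1],
    "direction_opp":    [0, 1, 1, 0],  # always 1 - direction_same
    "y":                [1, 0, 1, 1],
})

corr = toy.corr()

# A zero-variance column has no defined Pearson correlation, so pandas returns NaN.
print(corr["tocoupon_geq5min"].isna().all())

# Complementary columns are perfectly negatively correlated.
print(round(corr.loc["direction_same", "direction_opp"], 6))
```

Dropping tocoupon_geq5min and one of the two direction columns before plotting would leave a heatmap without the empty row and the redundant pair.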
sns.heatmap(data4_coffee_corr, annot=True)